We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of only 2D convolutional layers and subsampling layers. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types on the MagnaTagATune dataset, where a 4-layer architecture with mel-spectrogram input shows state-of-the-art performance. Furthermore, we evaluate the performance of architectures with varying numbers of layers on a larger dataset (the Million Song Dataset) and find that deeper models outperform the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
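To make the architecture family concrete, below is a minimal PyTorch sketch of a fully convolutional tagger built from 2D convolutions and subsampling only, ending in a sigmoid over the tag vocabulary. The filter counts, pool sizes, the 96x1366 mel-spectrogram input shape, and the 50-tag output are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FCNTagger(nn.Module):
    """Hypothetical 4-layer fully convolutional tagger: 2D conv +
    max-pooling (subsampling) blocks only, no dense feature layers.
    All hyperparameters here are assumptions for illustration."""

    def __init__(self, n_tags=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1),  # global pooling collapses time/frequency axes
        )
        self.classifier = nn.Linear(256, n_tags)

    def forward(self, x):
        # x: (batch, 1, n_mels, n_frames), e.g. a log mel-spectrogram
        h = self.features(x).flatten(1)
        # Sigmoid (not softmax): tagging is multi-label, so each tag
        # gets an independent probability.
        return torch.sigmoid(self.classifier(h))

model = FCNTagger(n_tags=50)
probs = model(torch.randn(2, 1, 96, 1366))
print(probs.shape)  # torch.Size([2, 50])
```

Because the network is fully convolutional, the global pooling at the end makes it tolerant to inputs of varying duration; only the final linear layer is tied to the number of tags.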